Research on predicting the driving forces of digital transformation in Chinese media companies based on machine learning

Chinese media companies are facing opportunities and challenges brought about by digital transformation. Media economics takes the evaluation of the business results of media companies as the main research topic. However, overcoming the internal differences in the industry and comprehensively predicting the digital transformation of Chinese media companies from multiple dimensions has become an important issue to be understood. Based on the “TOE-I” theoretical framework, this study innovatively uses machine learning methods to predict the digital transformation of Chinese media companies and to analyze specific modes of the main driving factors affecting the digital transformation, using data from China’s A-share-listed media companies from 2010 to 2020. The study found that environmental drivers can most effectively and accurately predict the digital transformation of Chinese media companies. Therefore, under sustained and stable economic and financial policies, guiding inter-industry competition and providing balanced digital infrastructure conditions are keys to bridging internal barriers in the media industry and promoting digital transformation. In the process of transformation from traditional content to digital production, media companies should focus on policy changes, economic benefits, the decision-making role of core managers, and the training and preservation of digital technology talent.


Literature review

The influencing factors of enterprise digital transformation
Current research on enterprises' digital transformation largely takes two perspectives: the factors that drive or hinder digital transformation, and the impact of digital transformation on all aspects of the enterprise. Regarding the drivers of digital transformation, Verhoef et al. 13 focus on the response of enterprises to changes in digital technology, increased digital competition, and the resulting digitalized customer behavior. While emerging digital technologies reduce labor costs, competition among companies intensifies and consumer preferences change accordingly. Yan et al. 14 found that mixed ownership reform is a catalyst for digital transformation and a key driving force for the sustainable development of China's state-owned enterprises. In addition, digitalization has become an important factor affecting the decision-making processes of entrepreneurs, and digital strategy is an important part of corporate strategy 15. Regarding the factors hindering digital transformation, Roman and Rusu 16 found that a lack of technology and capital negatively influences the digital transformation of enterprises, and that digital infrastructure has become an external factor affecting it. Regarding the impact of digital transformation, research focuses on the relationship between digital transformation and enterprise performance 17. Driven by the pursuit of profits, digital transformation has promoted the financialization of enterprises, especially among companies with poor internal and external governance 18. The digital transformation of enterprises significantly promotes mergers and acquisitions by reducing internal organizational costs, and this effect is more significant among private enterprises 19.

www.nature.com/scientificreports/
There are also many studies on digital transformation in different industries and different types of enterprises. For instance, Lange et al. 20 collected data from semi-structured interviews and concluded that capital is a driver of massive and rapid business scaling (MRBS) in digital start-ups. Roman and Rusu 16 established an econometric-based model and highlighted the relationship between the performance of SMEs and digital transformation indicators. Ardolino et al. 21 focused on how digital capabilities (IoT, cloud computing, and predictive analytics) support the service transformation of industrial enterprises.

Application of machine learning in the field of media economics
Media economics is a discipline at the intersection of economics, management, and communication. Its focus has shifted from the traditional media industry, represented by the printing, television, and film industries, to a new media research period centered on the Internet, digital platforms, and mobile communication media 22. Within the study of media economics under the corporate paradigm, evaluating the operational results of media organizations has always been a central issue. To evaluate business performance, Huang 23 put forward an evaluation system to measure the financialization level of a media company, including indices of ownership structure, shareholder value, financial asset holding ratio, and financial investment rate. Sheng et al. 24 established an evaluation system for the performance of media organizations' mergers and acquisitions to evaluate value-creation ability after mergers and acquisitions. Xie and Li 5 looked at the evaluation of the competitiveness of listed media companies during the big data era.
As the core of artificial intelligence technology, machine learning has been widely used in the field of journalism and communication; for example, in reforming modes of content production 25,26, in predicting and discovering social media trends, and in the emotional analysis of users 27-29. These methods are only occasionally used to predict and analyze the operation and development of media companies. Pan and Wang 6 used machine learning methods to conduct text analysis of the annual report information of media companies to identify the relationship between digital transformation and the value of cultural enterprises. Shi and Wang 30 focused on the advertising industry, using artificial neural network (ANN) algorithms to achieve intelligent evaluation and predictive analysis of advertising publishing and click results, and to optimize the resource utilization efficiency of the advertising industry. Sun et al. 31 used text mining and natural language processing (NLP) technology to conduct an emotional analysis of negative reports on the operation and financial status of media companies and established a warning mechanism for adverse impacts on financial status.
In summary, research in the field of media economics is focused on macro-level industry characteristics, such as the operation and management of integrated media and the development and operation models of new media formats. From the perspective of micro market entities, however, there is still a lack of theoretical and empirical research on how digital transformation within media companies transmits change to the industry. Research that uses the predictive ability of machine learning to analyze the driving factors behind the implementation of digital transformation strategies in media companies is rare. This research aims to find indicators that measure the degree of digital transformation in the media industry and applies machine learning methods to identify the key elements driving the digital transformation of the media industry in China.

The "TOE-I" prediction model
The theoretical framework of TOE (Technology, Organization, Environment) was first proposed by Tornatzky and Fleischer to comprehensively study and analyze the influencing factors that may cause interference when enterprises adopt innovative technologies 32. At the technology level, the TOE framework considers the influence of the internal technical level and technical support-related factors within an organization; that is, whether the enterprise can apply existing technologies, which is the basis for enterprises to adopt innovative technology 33,34. At the organizational level, to achieve the future application of innovative technologies within the organization, the focus is on the composition of specialties and responsibilities of organizational personnel at different levels 35,36. Environmental factors represent the macro external characteristics of the specific environment where the organization operates, such as government policies 37, competitive pressure, and the business environment 38. The TOE framework systematically considers technology and organizational factors both inside and outside organizations, so it is both systematic and operable.
However, different industry segments face different digital transformation challenges in the digital era. Some scholars have proposed that, to determine the specific factors in these three contexts and establish the potential relationships between them, the TOE framework can serve as a basic framework into which other relevant elements are integrated 33. Influenced by technological changes, media has changed dramatically in a short period of time. From publishing to radio and television, and then to the Internet and marketing on digital platforms, there are large gaps between the businesses and the content produced across the media industry. Therefore, based on the characteristics of Chinese media companies, this study takes "I" (Industry) as an important dimension for predicting their digital transformation and integrates it into the TOE framework.
Based on existing literature 10-12, this study divides the driving force of the digital transformation of Chinese media companies into the following four dimensions:
(1) Technical driving force. Digital technology is the basis of the digitalization of the media industry and should be applied in all fields, particularly the production and operation processes of the media industry 10,12. The technical upgrading of enterprises is directly reflected in investment in technical research and development and in the scale of technical personnel 39,40.
(2) Organizational driving force. The heterogeneity of corporate internal governance subjects, such as the characteristics of senior executives, enterprise organization, and governance structure, will lead to different behavior in digital transformation among media companies 41. Based on previous research 6,7, the characteristics of the organizational driving force include the size, knowledge level, and social resources of the senior management team, as well as the company's revenue ability, debt repayment ability, and continuous growth ability.
(3) Environmental driving force. In the tide of media globalization, Chinese media companies represented by advertising and games are expanding into overseas markets, while media companies such as radio, television, and publishing (which combine social and corporate attributes) are influenced by policy. Therefore, this study includes the degree of monetary policy easing, financial support, competition pressure in the industry, and the level of intellectual property protection by local government as environmental driving factors.
(4) Industrial driving force. Publishing, radio and television, advertising and film, games, and digital media face different industry bases and characteristics in digital transformation. Mainstream media, represented by publishing and radio and television, actively develop new forms and content based on new media platforms. They also undertake political tasks in guiding public opinion and "narrating Chinese stories well" 3,4. Big data and intelligent algorithms have continued to erode the boundaries of the traditional advertising industry, causing collective concern in the advertising industry 42. Within the industrial driving force dimension, China's listed media companies are subdivided into six industries (games, advertising and marketing, film and television cinema, digital media, publishing, and television broadcasting) when predicting the driving force of the industry.
This article therefore uses the TOE model and, given the particularity of the Chinese media industry, takes "Industry" as an additional influencing factor to explain why Chinese media companies conduct digital transformation, thereby filling the theoretical gap concerning the interaction mechanism between companies and industry characteristics in the Chinese media industry. To obtain a more accurate model, the study chose machine learning methods. Building on the practices of other research, we go beyond text analysis and use ensemble learning models, namely Random Forest Regression (RFR) and Gradient Boosting Regression (GBR), to expand the application of machine learning methods in the media field.

Research methods
This study uses ensemble (integrated) machine learning methods, which construct and combine multiple base learners to achieve more accurate predictions than a single learner. Following Nie et al. 43 and Parzinger et al. 44, and according to the degree of dependence among the base learners, we selected Gradient Boosting Regression (GBR) as a serial method and Random Forest Regression (RFR) as a parallel method, and compared them with two linear methods, multiple linear regression and LASSO. Integrated machine learning methods can capture the non-linear relationships and interactions between variables that linear models miss, so they perform well in out-of-sample prediction tasks 8. Therefore, this study expects integrated machine learning methods to outperform linear methods in predicting the degree of digital transformation of media companies.
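The comparison described above can be illustrated with a minimal sketch, assuming scikit-learn is available. The data are synthetic and purely hypothetical (a target with a squared term and an interaction), and the hyperparameters are library defaults or arbitrary choices, not the settings used in this study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, hypothetical data: a non-linear target with an interaction term.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(400, 3))
y = X[:, 0] ** 2 + X[:, 0] * X[:, 1] + rng.normal(0.0, 0.1, size=400)

# Random 8:2 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Two linear baselines and two ensemble learners (illustrative settings only).
models = {
    "OLS": LinearRegression(),
    "LASSO": Lasso(alpha=0.01),
    "GBR": GradientBoostingRegressor(random_state=0),
    "RFR": RandomForestRegressor(n_estimators=200, random_state=0),
}

# Out-of-sample goodness of fit for each method.
r2_oos = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    r2_oos[name] = r2_score(y_test, model.predict(X_test))

for name, score in r2_oos.items():
    print(f"{name}: out-of-sample R2 = {score:.3f}")
```

On data of this kind the linear baselines cannot represent the squared and interaction terms, which is the intuition behind the expectation stated above; real company data may of course behave differently.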

Model design
The model performance is examined from two perspectives: the model interpretation ability and the prediction error. In terms of model interpretation ability, Chen et al. 45 and Ghazwani and Begum 46 illustrate that ensemble learning can adjust itself based on the deviation between the fitted values and the observed values in the previous iteration and can thereby self-check the accuracy of the model. Therefore, we believe that the difference between the estimated values of the model and the observations can be used as a standard to evaluate the interpretation ability of the prediction model. The following two indicators are used: (1) Within-sample goodness of fit ($R^2_{IS}$), which evaluates the fitting effect of each research method on the training set. The higher the within-sample goodness of fit, the better the model explains the training set samples. (2) Out-of-sample goodness of fit ($R^2_{OOS}$), which measures the generalizability of the model. In addition, this study measures the generalization ability of the model from the perspective of variance, using the out-of-sample explained variance score ($EVS_{OOS}$) to measure how much of the dispersion of the actual values the model captures.
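As a rough illustration of these indicators, together with the error measures introduced in the next paragraph, the sketch below computes them from their standard textbook definitions. It is an assumption that Table 1 uses the same definitions; the function name `regression_metrics` is hypothetical.

```python
from statistics import mean, median, pvariance


def regression_metrics(y_true, y_pred):
    """Standard regression metrics for one set of actual and predicted values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    abs_residuals = [abs(r) for r in residuals]
    y_bar = mean(y_true)
    ss_res = sum(r ** 2 for r in residuals)          # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)   # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        # Explained variance ignores a constant bias in the predictions.
        "EVS": 1.0 - pvariance(residuals) / pvariance(y_true),
        "MSE": ss_res / len(residuals),
        "MAE": mean(abs_residuals),
        "MedAE": median(abs_residuals),
    }


# Predictions that are uniformly 0.5 too high: R2 is penalized, EVS is not.
metrics = regression_metrics([1, 2, 3, 4], [1.5, 2.5, 3.5, 4.5])
print(metrics)
```

Applying the function to the training set gives the within-sample values and applying it to the test set gives the out-of-sample values.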
In terms of model prediction error, we followed the practice of Chen et al. 47 in selecting the out-of-sample mean squared error ($MSE_{OOS}$) to investigate the degree of deviation between the predicted and actual values. A smaller out-of-sample mean squared error indicates more accurate model prediction. Because large deviations in the test set can distort the mean squared error, the mean absolute error ($MAE_{OOS}$) and the median absolute error ($MedAE_{OOS}$) were also used to evaluate the accuracy of the model prediction. The specific calculation of each evaluation index is shown in Table 1.
In interpreting the model results, the integrated machine learning method includes multiple learners, so it cannot be explained as directly as a single learner 48. To solve this problem, we used relative importance and partial dependency graphs to interpret the practical significance of the ensemble machine learning model. Relative importance refers to the degree of importance of one variable relative to the others in the process of model fitting. According to the method of Supsermpol et al. 49, the relative importance of a variable can be assessed by measuring the decrease in the model's fitting error after the variable is introduced. If the relative importance of a variable is high, it has a stronger influence in predicting the digital transformation of media companies. The partial dependency graph illustrates how changes in a certain variable influence the predicted digital transformation of a media company, assuming that other features are unchanged. Because it is displayed graphically, it gives an intuitive view of how a single variable relates to the predicted degree of digital transformation of media companies 50.
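The idea behind a partial dependency graph can be sketched in a few lines: force one feature to each value on a grid for every observation, leave the other features as observed, and average the model's predictions. The code below is a minimal sketch under that definition; `partial_dependence_curve` and `toy_predict` are hypothetical names, and the toy prediction function merely stands in for a fitted GBR or RFR model.

```python
def partial_dependence_curve(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value.

    predict: callable mapping a list of feature rows to a list of predictions.
    rows: observed feature rows (lists), used to average over other features.
    """
    curve = []
    for value in grid:
        # Replace only the chosen feature; all other features stay as observed.
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:]
            for row in rows
        ]
        predictions = predict(modified)
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_predict(rows):
    # Hypothetical stand-in for a fitted model: the effect of feature 0
    # rises up to 0.2 and is flat afterwards.
    return [min(row[0], 0.2) * 10 + row[1] for row in rows]


observed = [[0.1, 1.0], [0.3, 3.0]]
curve = partial_dependence_curve(toy_predict, observed, 0, [0.0, 0.2, 0.4])
print(curve)
```

Plotting the grid values against the returned averages gives the graph; a flat segment, as in the toy model beyond 0.2, indicates a range where the variable has little influence on the prediction.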

Data sources and variable definitions
Data source
This study selected media companies listed on the A-share market in 2010-2020 as the initial sample, with company data taken from the Wind and CSMAR databases. To exclude the interference of special observations with the prediction results, the data were processed as follows: (1) enterprises with ST, PT, or other abnormal listing status were excluded, to avoid interference with the overall prediction effect from abnormal operation of the enterprise itself; (2) samples with missing data were eliminated; and (3) continuous variables were winsorized at the 1% and 99% quantiles to avoid interference from extreme outliers. A final set of 395 observations was obtained. The classification of the media industry adopts the 2021 SHENYIN&WANGUO classification method in the CSMAR database.
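Step (3) can be sketched as follows, assuming the quantile treatment is winsorization (values beyond the 1% and 99% quantiles are clipped to those quantiles rather than dropped). The function name is hypothetical, and the quantile interpolation rule is that of Python's `statistics.quantiles` with `method="inclusive"`, which may differ slightly from the software the authors used.

```python
from statistics import quantiles


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clip values to the given lower and upper percentiles."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical variable with 101 observations, 1 through 101.
raw = list(range(1, 102))
clipped = winsorize(raw)
print(clipped[0], clipped[-1])
```

Only the most extreme observations at each end are changed, so the sample size is preserved, which matters for a data set of a few hundred observations.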

Variable definition
The digital transformation index (Digitaltransindex) in the CSMAR database was selected as the response variable. According to the CSMAR definition, the index is based on the frequency of digital transformation-related words in enterprises' annual reports and can effectively reflect whether, and to what degree, an enterprise has undergone digital transformation. It is divided into five parts: artificial intelligence (AI), blockchain (BC), cloud computing (CC), big data (BD), and the application level of digital technology (ADT). Table 2 shows the detailed calculations.
Based on existing research on digital transformation drivers, this study selected the driving force characteristics of the model from four dimensions, as shown in Fig. 1. In the technical dimension, this study draws on Yang and Xu 39 and Li 9 to select the intensity of R&D expenses and the proportion of technical personnel as the measurement indicators of innovation ability and absorption ability, as shown in Table 3.
In terms of the organizational dimension, two main factors affect strategic decisions in a company's digital transformation. The first is the leadership style of the company's CEO and management. The attitude of the executive team towards risk, as well as the decision-making style and decision-making power of management, are closely related to the implementation level of the digital transformation strategy. The second factor lies in the internal operations and cash flow of the company. Implementing a digital transformation strategy requires a large amount of capital, so the use and raising of internal and external funds should be listed as influencing factors. Referring to Bernile et al. 51, Schoar and Zuo 52, and Bandiera et al. 53, Manager Number, Education Level, Social Network, ROA, Growth, TobinQ, Lev, Top Ten Holders' Rate, Duality, and IndDirector Ratio were selected as the organizational variables. Detailed calculations are shown in Table 4. Also, with reference to Xu et al. 54, Sun and Zheng 55, and Wu and Ma 56, this study took Financial Support, Infrastructure Score, Monetary Policy, IP Protection, and HhiD as variables to measure the environmental characteristics of media companies. The above indicators reflect the overall business environment of the media industry and the support from governments in different regions for innovative development in the media industry, as shown in Table 5.
The fourth dimension is the industry classification.According to the revised version of the SHENYIN&WANGUO classification 2021, the media industry is subdivided into six categories: games, advertising and marketing, film and television cinema, digital media, publishing, and TV broadcasting, with a total of 141 listed companies, as shown in Table 6.
Similarly, this study draws on Li et al. 57,58, Zhao et al. 59, and Hanelt et al. 60 in taking Past Revenue, Cash Flow Ratio, Firm Age, Firm Size, and SOE as the benchmark variable group, as shown in Table 7.

Descriptive statistics
As shown in Table 8, the mean value of the digital transformation index of media companies is 493.9037975 and the standard deviation is 297.7080399, indicating that the degree of digital transformation differs significantly among industries. The other variables show no outliers.

Organization
Manager number: Natural logarithm of the total number of managers.
Education level: The average education level of the senior executive team. Education is coded from 1 to 4 (other degrees are 1, college degrees are 2 and 3, and graduate degrees are 4); the sum of the coded values across the senior executive team is divided by the total number of team members.

Environment
Financial support: The ratio of local financial expenditure on science and technology to public budget revenue.
Infrastructure score: The entropy weight method is used to combine indicators of the application and development of infrastructure supporting the digital economy into an infrastructure index, using provincial annual data.
Monetary policy: The annual M2 growth rate.
IP protection: The ratio of the technology market contract amount of each province to the GDP of that province in the current year, using provincial annual data.

HhiD: The Herfindahl-Hirschman Index of the industry.

The prediction effect of the model constructed based on the machine learning method on the digital transformation index of media companies
Table 9 shows the prediction results of the models constructed by the different methods for the degree of digital transformation of media companies. The results in column (1) show that the within-sample goodness of fit of the multiple linear regression and LASSO models is lower than that of the GBR and RFR, indicating that the within-sample fitting effect of the integrated learning methods is superior. In addition, the results of columns (2) and (3) in Table 9 show that the out-of-sample goodness of fit and explained variance of GBR have the highest values, 0.59123809 and 0.55601084, respectively, followed by RFR. The four indicators of the two methods are all higher than 0.5, indicating that machine learning methods can better predict the degree of digital transformation of media companies. Column (4) shows that the out-of-sample mean squared error of the GBR and the RFR is smaller than that of the multiple linear regression and LASSO. Finally, columns (5) and (6) indicate that the GBR and RFR have low mean absolute errors, 0.57720771 and 0.58604578, respectively. This indicates that the improvement in model performance is not pronounced once the influence of large deviations is excluded.
In conclusion, the GBR and RFR in the ensemble machine learning method fit better to the data, thus constructing a more accurate prediction research model.This study further discusses the driving forces of the digital transformation of media companies and the key factors.

Differences in the driving force dimensions of media companies' digital transformation prediction ability
To explore how the driving force dimensions differ in their ability to predict the digital transformation of media companies, this study first constructed a benchmark model from control characteristics such as years listed (Firm Age) and company size (Size), and then compared the prediction performance obtained after adding different combinations of driving forces. As the conclusions obtained from the different evaluation indicators are largely the same, this study analyzes the out-of-sample goodness of fit; the results are shown in Table 10.
Firstly, we considered the difference in the ability of single-dimensional drivers to predict the digital transformation of media companies. Compared with the other driving forces, adding the environmental driving force features to the benchmark model achieves the best prediction effect. Taking RFR as an example, after adding the technology, organization, environment, and industry drivers to the benchmark model, the out-of-sample goodness of fit increased by 92.86%, 100.53%, 145.17%, and 128.04%, respectively. Secondly, we considered differences in the ability of different combinations of drivers to predict the digital transformation of media companies. Combinations that include the environmental driving force dimension perform best: when two types of driving force characteristics are combined, adding the environmental and industry driving forces to the benchmark model yields the higher model interpretation ability. When the technology, environmental, and industry driving forces are all added to the benchmark model, the highest model interpretation ability is achieved. The results show that the environmental driving force is more accurate in predicting the degree of digital transformation, indicating that stable monetary policy, comprehensive infrastructure construction, government financial support for digital transformation, and good industry concentration are key elements in driving the digital transformation of Chinese media companies.
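The bookkeeping behind this comparison can be sketched as follows, assuming the reported percentages are relative increases in out-of-sample goodness of fit over the benchmark model. The helper names and the example scores are hypothetical and are not taken from Table 10.

```python
from itertools import combinations

DRIVER_GROUPS = ["technology", "organization", "environment", "industry"]


def driver_combinations(groups):
    """All non-empty combinations of driver groups to add to the benchmark."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


def relative_gain(r2_benchmark, r2_augmented):
    """Percentage increase in goodness of fit relative to the benchmark."""
    return (r2_augmented - r2_benchmark) / abs(r2_benchmark) * 100.0


# Four driver groups give 15 candidate feature sets beyond the benchmark.
print(len(driver_combinations(DRIVER_GROUPS)))

# Hypothetical scores: benchmark R2 of 0.20 rising to 0.50 after adding drivers.
print(round(relative_gain(0.20, 0.50), 2))
```

Each combination would be fitted with the same learner and split, so that differences in the score reflect the added features rather than the evaluation setup.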

Differential analysis of the prediction ability of digital transformation by key factors under different driving forces
Based on the GBR and RFR, the relative importance of each variable in the machine learning models can be obtained. Figures 2 and 3 report the relative importance ranking of the variables. Table 11 shows the variables ranked in the top 15 by the GBR and RFR methods, indicating that these characteristics are the key elements affecting the digital transformation of Chinese media companies.

Prediction model of digital transformation of media companies by important driving factors
Based on the GBR and RFR predictions, this study found that, among the many factors that affect the digital transformation of media companies, monetary policy, industry competition pressure, the proportion of technical size, return on assets, age of the listed company, and industry classification have the best effect on predicting the digital transformation of media companies. These factors are discussed in turn below, with Figures 4-7 presented in the same order.
Monetary policy. Figure 4 is a partial dependency graph of monetary policy. The proxy variable for monetary policy is the growth rate of M2. As shown in the figure, when the growth rate of M2 is less than 12.5%, there is no obvious impact on the degree of digital transformation of enterprises. However, when the growth rate is higher than 12.5%, the degree of digital transformation in enterprises shows a downward trend. Therefore, this study holds that the impact of monetary policy on the digital transformation of enterprises is not monotonic, and that enterprise managers should pay attention to the external environment at all times and adjust the digital transformation process of media companies in a timely manner.
Industry competition pressure. Figure 5 shows the partial dependence on the HhiD of the industry, which is used to measure the level of competition among companies. Industry competition reflects the intensity of the competition for limited resources among companies. When the index is less than 0.05%, its impact on the digital transformation of media companies is significant and monotonic. When the index is between 0.05% and 0.3%, the degree of digital transformation of media companies slowly decreases, and once it reaches 0.3%, the impact is small. This shows that when the HhiD is high, media companies have low entry barriers and stable profit flow, thus enterprises have little or no demand to achieve differentiation in homogeneous competition.
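For reference, the Herfindahl-Hirschman Index is conventionally the sum of squared market shares within an industry. The sketch below uses that textbook definition and assumes shares are computed from firm revenue, which the variable definition here does not specify; the scaling of HhiD in this study may also differ from the 0-to-1 scale used in the sketch.

```python
def herfindahl_hirschman_index(revenues):
    """Sum of squared market shares for the firms in one industry-year."""
    total = sum(revenues)
    return sum((revenue / total) ** 2 for revenue in revenues)


# Hypothetical industries: the fewer and more unequal the firms,
# the closer the index is to 1.
print(herfindahl_hirschman_index([25, 25, 25, 25]))  # four equal firms
print(herfindahl_hirschman_index([50, 50]))          # two equal firms
print(herfindahl_hirschman_index([100]))             # a single firm
```

With n equally sized firms the index equals 1/n, so low values correspond to many comparably sized competitors and high values to a concentrated industry.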
The proportion of technical size. Figure 6 shows a partial dependence diagram of the proportion of technical size, which this study selected as the proxy variable. When the proportion of technical size is less than 10%, its influence on the digital transformation index of media companies increases rapidly. When the proportion approaches 20%, the impact on the digital transformation index is highest. Above this level, the impact is significantly reduced. Therefore, media companies should determine the proportion of technical size according to the requirements of digital transformation.
Return on assets of enterprises. Figure 7 shows a partial dependency graph of the return on assets (ROA) of an enterprise. A negative ROA indicates problems in the operation of the enterprise: managers will put more time and energy into business activities rather than focusing on digital transformation, so the impact is small. When the return on assets of media companies is positive, the impact rises rapidly for a time and then stabilizes, indicating that its impact on the intensity of digital transformation fluctuates.
Age of the listed companies. Figure 8 shows the age of the listed company, one of the benchmark variables used to describe company characteristics. The results indicate that when a media company has been established for less than 15 years, the impact on its digital transformation is not significant. When the establishment period reaches 15 years, the effect on digital transformation is at its minimum; it then shows a significant increasing trend until the establishment period reaches 25 years. The management experience of a media company affects the strength of its digital transformation. When a company is young and lacks operating experience, its digital transformation is relatively low. In contrast, established companies with greater ability and resources can effectively support the digital transformation process, making full use of their information advantage and achieving scale effects.
Industry classification. This research innovatively integrates the industry driving force dimension into the theoretical framework of TOE and forms the "TOE-I" model to predict the intensity of digital transformation of Chinese media companies. Firstly, with the two integrated machine learning methods of GBR and RFR, the model is shown to have a significant prediction effect for the advertising industry and the film and television industry, indicating that "TOE-I" can be better applied to predicting digital transformation in these two industries. Secondly, the prediction ability for radio and television, digital media, and the publishing industry is weak. Radio, television, and publishing are traditional mainstream media that extend China's mainstream media to the Internet; the stronger the political attribute, the stronger the uncertainty of digital transformation, and the more difficult it is to predict accurately using the "TOE-I" model. Thirdly, game companies have usually been established more recently and are based on digital technology, so when the digital transformation index is taken as the measurement standard, the prediction effect is not significant enough. See Table 12 for details.
Robustness test.First, changing the training set division method.This study used the 8:2 proportion random classification to determine the training set and the test set, which weakens the randomness to some extent.Therefore, the K-fold method was used for further random division in the robustness test.In machine learning, K-fold cross-validation is a common method of model evaluation.It can help us accurately evaluate the performance of machine learning models and provides more reliable results particularly if the data is limited.The validation steps of the K-fold method were as follows: • The original dataset was split randomly into K subsets of similar size, taking K values of 10.
• One subset was selected as the validation set and the remaining K-1 subsets as the training set.
• The model was trained using the training set and evaluated on the validation set.
• Steps 2 and 3 were repeated until each subset had been used as a validation set.
• The results of the K evaluations were combined to obtain the final model evaluation index.
Because the evaluation is repeated across folds, K-fold cross-validation yields more stable results and reduces the contingency caused by different data divisions. For small datasets, K-fold cross-validation evaluates model performance better and reduces the overfitting or underfitting problems caused by limited data.
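The K-fold steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the study's code: the helper names (`k_fold_indices`, `fit_line`, `cross_validate`), the synthetic data, and the simple least-squares line standing in for the GBR and RFR models are all assumptions made for the example.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k folds of similar size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (a stand-in for GBR/RFR)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - b * x_bar, b

def cross_validate(xs, ys, k=10, seed=0):
    """Return the per-fold validation MSE and its average over the k folds."""
    fold_mse = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        # The remaining k-1 folds form the training set.
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Evaluate on the single held-out fold.
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in held_out]
        fold_mse.append(mean(errors))
    return fold_mse, mean(fold_mse)

# Synthetic data (not the paper's): y = 2x + 1 plus small Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.1) for x in xs]
fold_mse, avg_mse = cross_validate(xs, ys, k=10)
print(len(fold_mse), round(avg_mse, 4))
```

Averaging the per-fold errors corresponds to the final step of the list: every observation is used for validation exactly once, so the combined index does not depend on one particular 8:2 split.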
As shown in Table 13, after re-partitioning the training and test sets with the K-fold method, the findings are unchanged relative to Table 9.
Second, the measure of the intensity of digital transformation was changed. Drawing on Xiao 61, this study used a different set of terms for enterprise digital transformation strength: it eliminated the term "digital technology application" at the application level and retained only the terms "artificial intelligence", "blockchain", "cloud computing", and "big data" at the basic digital technology level. One was added to the total occurrence frequency, and the natural logarithm of the result was taken as the replacement variable for robustness testing. After refitting the model with the new response variable, the results were consistent with the main test; the specific tests are shown in Table 14.
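The replacement variable described above, the natural logarithm of one plus the total occurrence frequency of the four retained terms, can be illustrated as follows. This is only a sketch: the function name `transformation_intensity`, the English sample sentence, and the assumption that frequencies are obtained by simple substring counting in a text are illustrative choices, not details taken from the study.

```python
import math

# The four basic digital technology terms retained in the robustness test.
BASIC_TERMS = ("artificial intelligence", "blockchain", "cloud computing", "big data")

def transformation_intensity(report_text, terms=BASIC_TERMS):
    """Return ln(1 + total number of occurrences of the retained terms)."""
    text = report_text.lower()
    total = sum(text.count(term) for term in terms)
    # Adding 1 keeps the measure defined (and equal to 0) when no term occurs.
    return math.log(1 + total)

# Hypothetical sample text, used only to exercise the function.
sample = ("We invested in cloud computing and big data; "
          "big data drives our blockchain pilots.")
print(transformation_intensity(sample))
```

Adding one before taking the logarithm keeps companies with zero term occurrences in the sample while compressing the right tail of very high frequencies.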

Conclusion
Previous research has focused on the correlation between a single factor within a single dimension and the digital transformation of the media industry, conducting only in-sample predictions and lacking a comprehensive consideration of the driving forces in the digital transformation of media companies [5][6][7]. In this study, the driving forces of the digital transformation of Chinese media companies are divided into four dimensions: technological, organizational, environmental, and industry drivers (i.e., the "TOE-I" model). The purpose of classifying the driving forces is to explore how well each dimension predicts the digital transformation of media companies. This research analyzed the key driving factors of the digital transformation of media companies and the specific ways in which these factors influence the digital transformation of the Chinese media industry.
This study used ensemble learning methods, together with relative importance indicators and partial dependence plots, to realize the research purpose. By comparing the fit of different combinations of driving-force dimensions in the benchmark model, we found that the environmental driving force predicts the digital transformation behavior of the Chinese media industry effectively and accurately, showing environmental drivers to be the dominant factor influencing Chinese media companies' strategies for digital transformation. Compared with linear methods such as multiple linear regression, the ensemble learning methods achieved better performance in both model interpretation ability and minimization of prediction error, with the RFR method having the best predictive performance. The following driving factors all have significant predictive effects on the digital transformation of the media industry in China: (1) monetary policy, competition pressure in the industry, and the infrastructure index in the environmental driving force; (2) equity concentration, enterprise value, executive team knowledge level, and social networks in the organizational driving force; and (3) the advertising and the film and television industries in the industry driving force.
Based on the above conclusions, the following policy implications are suggested:
(1) Policymakers should provide stable monetary policies. Globally oriented media companies, such as game companies and Internet advertising and marketing companies, should be provided with stable economic and financial policies to facilitate their digital transformation. As shown in Fig. 4, the intensity of digital transformation is highest when the M2 growth rate is 12.5%. Therefore, government managers should provide a stable monetary policy to promote the digital transformation of media companies. Moreover, as the comparison of prediction ability demonstrates, the external environmental driving force has the greatest impact on the digital transformation of media companies. Therefore, in addition to providing a stable monetary policy, competition between industries should be guided and matching infrastructure conditions for digital transformation should be provided.
(2) Managers should maintain stable profit sources to promote digital transformation in media companies; this applies, for example, to radio, television, and newspaper companies, which have strong political attributes and place little emphasis on income. This study indicates that ensuring a positive cash flow is an important driving factor for digital transformation. Attention should also be paid to the decision-making role of core managers in the process of digital transformation. The two machine learning prediction methods adopted in this research, GBR and RFR, both showed enterprise core managers to be an important predictor for Chinese media companies (ranked 2nd by GBR and 3rd by RFR in Table 11). Thus, to ensure digital transformation, media companies should focus on the decision-making role of core managers. Furthermore, media companies should pay attention to the cultivation and retention of digital technical talent. In the gradual conversion from traditional content and news production to digital, platform-based production, the cultivation and retention of technical talent are crucial. Figure 6 illustrates that digital transformation is highest when the proportion of technical personnel in the company is 20%. Therefore, media companies should recruit digital technical personnel to maintain this level.
(3) Within the media industry, it is necessary to seize the opportunity of technological change and pay close attention to policy changes. As shown in this study, the time of establishment of enterprises, changes in media, and internal gaps in the media industry all cause differences in the digital transformation of the media industry.
(4) There is a gap in the application of empirical research and machine learning methods in existing media economics research, so the application of machine learning in the fields of media economics, journalism, and communication should be continuously promoted.
Indicator; definition; calculation method
[…]: goodness of fit; in the training set, the degree to which the model's predicted values fit the observed values
R […]-sample: goodness of fit; in the training set, the degree to which the model's predicted values fit the observed values
EVS_oos: explained variance; in the prediction set, the degree to which the model's predictions fit the variation of the observed values; $\mathrm{EVS}_{oos} = 1 - \operatorname{var}(y - \hat{y}) / \operatorname{var}(y)$
MSE_oos: mean squared error; the expected value of the squared difference between the out-of-sample predicted value and the actual value; $\mathrm{MSE}_{oos} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
MAE_oos: mean absolute error; the expected value of the absolute difference between the out-of-sample predicted value and the actual value; $\mathrm{MAE}_{oos} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
MedAE_oos: median absolute error; the median of the absolute difference between out-of-sample predicted and actual values; $\mathrm{MedAE}_{oos} = \operatorname{median}(|y_i - \hat{y}_i|)$

Scientific Reports | (2024) 14:7286 | https://doi.org/10.1038/s41598-024-57873-7

Variable name; variable definition
Social network: the total number of senior executives working in other enterprises in the corresponding year
Top ten holders' rate: share ratio of the top ten shareholders
Duality: duality = 1, non-duality = 0
IndDirector ratio: the proportion of the number of independent directors to the number of the board of directors
ROA: return on assets (profit for the year / total assets)
Lev: the ratio of total liabilities to total assets
Growth: (operating income of this year / operating income of last year) − 1
TobinQ: (market value of tradable shares + number of non-tradable shares × net assets per share + book value of liabilities) / total assets
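The four out-of-sample indicators defined above (EVS, MSE, MAE, and MedAE) can be computed directly from paired actual and predicted values. The sketch below uses only the Python standard library; the function names and the small example vectors are assumptions made for illustration and are not the study's data or code.

```python
from statistics import mean, median, pvariance

def evs(y_true, y_pred):
    """Explained variance score: 1 - var(y - y_hat) / var(y)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - pvariance(residuals) / pvariance(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of squared prediction errors."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute prediction errors."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def medae(y_true, y_pred):
    """Median absolute error: median of absolute prediction errors."""
    return median(abs(t - p) for t, p in zip(y_true, y_pred))

# Illustrative values only (not taken from the paper).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
for name, metric in (("EVS", evs), ("MSE", mse), ("MAE", mae), ("MedAE", medae)):
    print(name, metric(y_true, y_pred))
```

MedAE is the least sensitive of the three error measures to a few very large prediction errors, which is one reason to report it alongside MSE and MAE.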

Figure 2. Relative importance ranking based on GBR.

Figure 3. Relative importance ranking based on RFR.

Figure 5. Partial dependence chart of industry competitiveness.

Figure 6. Partial dependence of technical size.

Figure 7. Partial dependency of the ROA.

Figure 8. Partial dependence on the age of listed companies.

Table 1. Model evaluation indicators and calculation methods.
Type of variable; variable name; variable definition
Industry category; Industry name; According to the revised version of the SHENYIN&WANGUO classification 2021, the media industry is subdivided into six categories: games, advertising marketing, film and television cinema, digital media, publishing, and TV & radio

Table 9. Results of model fitting.

Table 10. Prediction performance under different combinations of driving forces.

Table 12. Comparison of industry factors on digital transformation of media companies.

Table 13. Test of robustness-Panel A.

Table 14. Test of robustness-Panel B.